Empirical analysis of representation learning and exploration in neural kernel bandits
Neural bandits have been shown to provide an efficient solution to practical
sequential decision tasks that have nonlinear reward functions. The main
contributor to that success is approximate Bayesian inference, which enables
neural network (NN) training with uncertainty estimates. However, Bayesian NNs
often suffer from a prohibitive computational overhead or operate on a subset
of parameters. Alternatively, certain classes of infinite neural networks were
shown to directly correspond to Gaussian processes (GP) with neural kernels
(NK). NK-GPs provide accurate uncertainty estimates and can be trained faster
than most Bayesian NNs. We propose to guide common bandit policies with NK
distributions and show that NK bandits achieve state-of-the-art performance on
nonlinear structured data. Moreover, we propose a framework for measuring
independently the ability of a bandit algorithm to learn representations and
explore, and use it to analyze the impact of NK distributions with respect to those two
aspects. We consider policies based on a GP and a Student's t-process (TP).
Furthermore, we study practical considerations, such as training frequency and
model partitioning. We believe our work will help better understand the impact
of utilizing NKs in applied settings.
Comment: Extended version. Added a major experiment comparing NK distributions
w.r.t. exploration and exploitation. Submitted to ICLR 202
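The abstract describes guiding standard bandit policies with a GP posterior. A minimal sketch of how that works, assuming a generic RBF kernel as a stand-in for a neural kernel (the abstract does not specify the kernel or policy hyperparameters), using a UCB acquisition over candidate arms:

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    # Squared-exponential kernel; a placeholder for a neural kernel (NK).
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def gp_posterior(X_train, y_train, X_cand, noise=1e-2, kernel=rbf_kernel):
    # Standard GP regression posterior mean and variance at candidate arms.
    K = kernel(X_train, X_train) + noise * np.eye(len(X_train))
    Ks = kernel(X_train, X_cand)
    Kss = kernel(X_cand, X_cand)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mu = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.diag(Kss) - (v ** 2).sum(0)
    return mu, np.maximum(var, 1e-12)

def ucb_select(X_train, y_train, X_cand, beta=2.0):
    # UCB policy: pick the arm maximizing posterior mean + beta * std,
    # so high-uncertainty arms are explored and high-mean arms exploited.
    mu, var = gp_posterior(X_train, y_train, X_cand)
    return int(np.argmax(mu + beta * np.sqrt(var)))
```

A Thompson-sampling variant would instead draw one sample from the same posterior and pick its argmax; the paper additionally considers Student's t-process posteriors, which this sketch does not cover.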
Federated Training of Dual Encoding Models on Small Non-IID Client Datasets
Dual encoding models that encode a pair of inputs are widely used for
representation learning. Many approaches train dual encoding models by
maximizing agreement between pairs of encodings on centralized training data.
However, in many scenarios, datasets are inherently decentralized across many
clients (user devices or organizations) due to privacy concerns, motivating
federated learning. In this work, we focus on federated training of dual
encoding models on decentralized data composed of many small, non-IID
(independent and identically distributed) client datasets. We show that
existing approaches that work well in centralized settings perform poorly when
naively adapted to this setting using federated averaging. We observe that we
can simulate large-batch loss computation on individual clients for loss
functions that are based on encoding statistics. Based on this insight, we
propose a novel federated training approach, Distributed Cross Correlation
Optimization (DCCO), which trains dual encoding models using encoding
statistics aggregated across clients, without sharing individual data samples.
Our experimental results on two datasets demonstrate that the proposed DCCO
approach outperforms federated variants of existing approaches by a large
margin.
Comment: ICLR 2023 Workshop on Pitfalls of Limited Data and Computation for
Trustworthy M
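The key insight above is that a large-batch loss can be simulated from per-client encoding statistics. A minimal sketch of that idea, assuming the relevant statistic is a cross-correlation matrix between the two encoders' outputs (the exact statistics and protocol of DCCO are not given in the abstract); each client ships only sums, sums of squares, and cross products, never raw samples:

```python
import numpy as np

def client_stats(Za, Zb):
    # Per-client sufficient statistics for a cross-correlation loss.
    # Za, Zb: (n_local, d) encodings of the two views/inputs.
    return {
        "n": Za.shape[0],
        "sum_a": Za.sum(0), "sum_b": Zb.sum(0),
        "sq_a": (Za ** 2).sum(0), "sq_b": (Zb ** 2).sum(0),
        "cross": Za.T @ Zb,
    }

def aggregate_cross_correlation(stats_list):
    # Server-side: combine per-client statistics to recover the same
    # normalized cross-correlation matrix a single large centralized
    # batch would produce, without ever seeing individual samples.
    n = sum(s["n"] for s in stats_list)
    mu_a = sum(s["sum_a"] for s in stats_list) / n
    mu_b = sum(s["sum_b"] for s in stats_list) / n
    var_a = sum(s["sq_a"] for s in stats_list) / n - mu_a ** 2
    var_b = sum(s["sq_b"] for s in stats_list) / n - mu_b ** 2
    cov = sum(s["cross"] for s in stats_list) / n - np.outer(mu_a, mu_b)
    return cov / np.sqrt(np.outer(var_a, var_b) + 1e-12)
```

Because sums and cross products are additive, splitting the batch across many small non-IID clients changes nothing in the aggregated matrix, which is exactly what makes the simulated large-batch loss possible.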
BERT for Long Documents: A Case Study of Automated ICD Coding
Transformer models have achieved great success across many NLP problems.
However, previous studies in automated ICD coding concluded that these models
fail to outperform some of the earlier solutions such as CNN-based models. In
this paper we challenge this conclusion. We present a simple and scalable
method to process long text with the existing transformer models such as BERT.
We show that this method significantly improves the previous results reported
for transformer models in ICD coding, and is able to outperform one of the
prominent CNN-based methods.
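A common way to let a fixed-length transformer such as BERT process long text, and plausibly the kind of simple, scalable scheme the abstract alludes to (the exact method is not specified there), is to split the document into overlapping chunks, encode each chunk independently, and pool the per-chunk label scores, which suits multi-label ICD coding:

```python
import numpy as np

def chunk_tokens(token_ids, max_len=512, stride=256):
    # Split a long token sequence into overlapping windows that each
    # fit the transformer's input limit (512 tokens for BERT).
    chunks = []
    for start in range(0, max(len(token_ids) - stride, 1), stride):
        chunks.append(token_ids[start:start + max_len])
    return chunks

def predict_codes(token_ids, encode_fn, pool="max"):
    # encode_fn is a hypothetical stand-in for a BERT classifier head
    # mapping one chunk to per-label scores. Per-chunk scores are pooled
    # into one document-level prediction; max-pooling fires a label if
    # any chunk provides evidence for it.
    scores = np.stack([encode_fn(c) for c in chunk_tokens(token_ids)])
    return scores.max(0) if pool == "max" else scores.mean(0)
```

The overlap (`stride` < `max_len`) keeps evidence that straddles a chunk boundary visible to at least one chunk; both window size and pooling rule are tunable assumptions here.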
Sample adaptive multiple kernel learning for failure prediction of railway points
Railway points are among the key components of railway infrastructure. As part of signal equipment, points control the routes of trains at railway junctions, having a significant impact on the reliability, capacity, and punctuality of rail transport. They are also among the most fragile parts of railway systems, and points failures cause a large portion of railway incidents. Traditionally, maintenance of points is performed at fixed time intervals or triggered after equipment failures. It would instead be of great value to forecast points failures and take action beforehand, minimising any negative effects. To date, most existing prediction methods are either lab-based or rely on specially installed sensors, which makes them infeasible for large-scale deployment, and they often use data from only one source.
We therefore explore a new approach that integrates readily available multi-source data to fulfil this task. We conducted our case study on the Sydney Trains rail network, an extensive network of passenger and freight railways. Real-world data are usually incomplete for various reasons, e.g., faults in the database, operational errors, or transmission faults. Moreover, railway points differ in their locations, types, and other properties, making it hard to predict their failures with a single unified model. Aiming at this challenging task, we first constructed a dataset from multiple sources and selected key features with the help of domain experts. We formulate our prediction task as a multiple kernel learning problem with missing kernels and present a robust multiple kernel learning algorithm for predicting points failures. Our model takes into account the missing pattern of the data as well as the inherent variance across different sets of railway points.
Extensive experiments demonstrate the superiority of our algorithm compared with other state-of-the-art methods.
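The multiple kernel learning setup above combines one base kernel per data source, with some kernels missing for some samples. A minimal sketch of the combination step, assuming a convex weighting of base kernels and a per-source availability mask (the paper's actual handling of missing kernels and its learning of the weights are not detailed in the abstract):

```python
import numpy as np

def combine_kernels(kernels, weights, available=None):
    # Convex combination K = sum_m w_m * K_m of base kernel matrices.
    # available[m][i] == 1 marks samples with data for source m; a pair
    # (i, j) where either sample is missing contributes nothing to K
    # from that kernel, encoding the missing-kernel pattern directly.
    K = np.zeros_like(kernels[0], dtype=float)
    for m, (Km, w) in enumerate(zip(kernels, weights)):
        if available is None:
            K += w * Km
        else:
            mask = np.outer(available[m], available[m])
            K += w * Km * mask
    return K
```

The combined matrix can then be fed to any kernel method (e.g., an SVM-style classifier) for the failure-prediction step; masking before combination is one simple assumption, not necessarily the paper's scheme.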